
    Statistical Augmentation of a Chinese Machine-Readable Dictionary

    We describe a method of using statistically collected Chinese character groups from a corpus to augment a Chinese dictionary. The method is particularly useful for extracting domain-specific and regional words not readily available in machine-readable dictionaries. Output was evaluated both by human evaluators and against a previously available dictionary. We also evaluated the performance improvement in automatic Chinese tokenization. Results show that our method outputs legitimate words, acronymic constructions, idioms, names and titles, as well as technical compounds, many of which were lacking from the original dictionary.
    Comment: 17 pages, uuencoded compressed PostScript
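
    As a rough illustration of the idea, the sketch below proposes character n-grams from a corpus as dictionary candidates using frequency and pointwise mutual information. The abstract does not specify the paper's exact statistics, so the PMI thresholding, the function name, and the cutoff parameters are all assumptions for illustration.

```python
from collections import Counter
import math

def candidate_words(corpus, dictionary, n=2, min_count=5, min_pmi=3.0):
    """Propose character n-grams as dictionary candidates (a sketch;
    the paper's actual statistics may differ)."""
    # Count single characters and contiguous character n-grams.
    unigrams = Counter(corpus)
    ngrams = Counter(corpus[i:i + n] for i in range(len(corpus) - n + 1))
    total = sum(unigrams.values())
    for gram, count in ngrams.items():
        # Skip rare groups and words the dictionary already contains.
        if count < min_count or gram in dictionary:
            continue
        # PMI: how much more often the characters co-occur than chance.
        p_gram = count / total
        p_indep = math.prod(unigrams[c] / total for c in gram)
        pmi = math.log2(p_gram / p_indep)
        if pmi >= min_pmi:
            yield gram, pmi  # candidate word for evaluation
```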

    SRL for low resource languages isn’t needed for semantic SMT

    Previous attempts at injecting semantic frame biases into SMT training for low resource languages failed because either (a) no semantic parser is available for the low resource input language, or (b) the output English language semantic parses excise relevant parts of the alignment space too aggressively. We present the first semantic SMT model to succeed in significantly improving translation quality across many low resource input languages for which no automatic SRL is available, consistently and across all common MT metrics. The results we report are the best by far to date for this type of approach; our analyses suggest that, in general, easier approaches toward including semantics in training SMT models may be more feasible than generally assumed, even for low resource languages where semantic parsers remain scarce. While recent proposals to use the crosslingual evaluation metric XMEANT during inversion transduction grammar (ITG) induction are inapplicable to low resource languages that lack semantic parsers, we break the bottleneck via a vastly improved method of biasing ITG induction toward learning more semantically correct alignments using the monolingual semantic evaluation metric MEANT. Unlike XMEANT, MEANT requires only a readily available English (output language) semantic parser. The advances we report here exploit the novel realization that MEANT represents an excellent way to semantically bias expectation-maximization induction even for low resource languages. We test our systems on challenging languages including Amharic, Uyghur, Tigrinya and Oromo. Results show that our model biases learning toward more semantically correct alignments, leading to better translation quality than both the standard ITG and GIZA++ based SMT training models on different datasets.
    This material is based upon work supported in part by the Defense Advanced Research Projects Agency (DARPA) under LORELEI contract HR0011-15-C-0114, BOLT contracts HR0011-12-C-0014 and HR0011-12-C-0016, and GALE contracts HR0011-06-C-0022 and HR0011-06-C-0023; by the European Union under the Horizon 2020 grant agreement 645452 (QT21) and FP7 grant agreement 287658; and by the Hong Kong Research Grants Council (RGC) research grants GRF16210714, GRF16214315, GRF620811 and GRF621008.
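
    A minimal sketch of the semantic biasing idea: during EM induction of ITG alignments, each hypothesis's expectation is reweighted by a MEANT-style semantic score of its English side against the reference. The interpolation scheme, the function names, and the `meant_score` callback are assumptions for illustration, not the paper's exact formulation.

```python
def semantically_biased_expectations(hypotheses, reference, meant_score,
                                     bias=0.5):
    """Reweight ITG alignment hypotheses by a semantic adequacy score.

    hypotheses:  list of (alignment, model_prob, english_output) tuples.
    meant_score: callable returning a score in [0, 1] comparing the
                 English output to the English reference; it stands in
                 for a monolingual MEANT scorer.
    """
    weighted = []
    for alignment, prob, english in hypotheses:
        semantic = meant_score(english, reference)
        # Log-linear interpolation of model probability and semantics;
        # the mixing scheme here is an assumption, not the paper's.
        weight = (prob ** (1.0 - bias)) * (max(semantic, 1e-9) ** bias)
        weighted.append((alignment, weight))
    total = sum(w for _, w in weighted) or 1.0
    # Normalize so the biased expectations form a distribution for the M-step.
    return [(a, w / total) for a, w in weighted]
```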

    Lightweight Self-Forming Super-Elastic Mechanical Metamaterials with Adaptive Stiffness

    Scarcity of stiff yet compliant materials is a major obstacle toward biological-like mechanical systems that perform precise manipulations while being resilient under excessive load. We introduce a macroscopic cellular structure comprising two pre-stressed elastic “phases”, which displays a load-sensitive stiffness that drops by 30 times upon a “pseudo-ductile transformation” and accommodates a fully recoverable compression of over 60%. This provides an exceptional 20 times more deformability beyond the linear-elastic regime, doubling the capability of previously reported super-elastic materials. By virtue of the pre-stressing process based on thermal shrinkage, it simultaneously enables a heat-activated self-formation that transforms a flat laminate into the metamaterial with 50 times volumetric growth. The metamaterial is thereby inherently lightweight, with a bulk density on the order of 0.01 g cm−3, one order of magnitude lower than that of existing super-elastic materials. Besides demonstrating highly programmable geometrical and mechanical characteristics, this paper is the first to present a method that generates single-crystal- or polycrystal-like 3D lattices with anisotropic or isotropic super-elasticity. This pre-stress-induced adaptive stiffness with high deformability could be a step toward in-situ deployed ultra-lightweight mechanical systems with a diverse range of applications that benefit from being both stiff and compliant.

    Pushdown automata in statistical machine translation

    This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with that of a decoder based on a finite state automata representation, showing that PDAs provide a more suitable framework for achieving exact decoding with larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy that uses a weaker language model in the first pass, addressing the results of the PDA complexity analysis. We study in depth the experimental conditions and tradeoffs under which HiPDT can achieve state-of-the-art performance for large-scale SMT.
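
    To make the PDA representation concrete, here is a naive shortest-path sketch over a weighted PDA whose arcs carry output words or open/close parentheses marking stack pushes and pops; searching over (state, stack) configurations corresponds loosely to the expansion the article mentions. The arc encoding, the names, and the stack-depth bound are assumptions; HiPDT's actual algorithms are far more compact than this brute-force search.

```python
import heapq
import itertools

def pda_shortest_path(arcs, start, final, max_depth=8):
    """Cheapest accepting path through a weighted PDA (illustrative only).

    arcs: list of (src, dst, label, weight); label is an output word,
    '(' for a push, ')' for a pop, or None for epsilon.
    """
    tie = itertools.count()  # tiebreaker so the heap never compares stacks
    heap = [(0.0, next(tie), start, (), ())]
    seen = set()
    while heap:
        cost, _, state, stack, words = heapq.heappop(heap)
        if state == final and not stack:
            return cost, words  # accept with an empty stack
        if (state, stack) in seen:
            continue
        seen.add((state, stack))
        for src, dst, label, w in arcs:
            if src != state:
                continue
            if label == '(':  # push, bounded to keep the search finite
                if len(stack) < max_depth:
                    heapq.heappush(heap, (cost + w, next(tie), dst,
                                          stack + ('(',), words))
            elif label == ')':  # pop only if something was pushed
                if stack:
                    heapq.heappush(heap, (cost + w, next(tie), dst,
                                          stack[:-1], words))
            else:  # emit an output word (or take an epsilon step)
                out = words + (label,) if label else words
                heapq.heappush(heap, (cost + w, next(tie), dst, stack, out))
    return None  # no accepting path within the stack bound
```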

    Machine Translation with a Stochastic Grammatical Channel

    We introduce a stochastic grammatical channel model for machine translation that synthesizes several desirable characteristics of both statistical and grammatical machine translation. As with the pure statistical translation model described by Wu (1996), in which a bracketing transduction grammar models the channel, alternative hypotheses compete probabilistically, exhaustive search of the translation hypothesis space can be performed in polynomial time, and robustness heuristics arise naturally from a language-independent inversion transduction model. However, unlike pure statistical translation models, the generated output string is guaranteed to conform to a given target grammar. The model employs only (1) a translation lexicon, (2) a context-free grammar for the target language, and (3) a bigram language model. Because no explicit bilingual translation rules are used, the model is easily portable to a variety of source languages. Initial experiments show that it also achieves significant speed gains over our earlier model.
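
    The noisy-channel decomposition the abstract describes can be sketched as scoring a target hypothesis by a bigram language model plus a lexical channel model, with the hypothesis space restricted to strings derivable by the target CFG. The scoring below is a minimal stand-in under that assumption; the names `lex_prob` and `bigram_prob` are hypothetical, and the real decoder searches the constrained space in polynomial time rather than scoring a single hypothesis.

```python
import math

def channel_score(source_words, target_words, lex_prob, bigram_prob):
    """Log score of a target hypothesis under a noisy-channel sketch:
    log P(e) from a bigram LM plus log P(c|e) from a translation lexicon."""
    # Bigram LM: P(e) chained over <s> w1 ... wn </s>.
    lm = sum(math.log(bigram_prob(prev, cur))
             for prev, cur in zip(['<s>'] + target_words,
                                  target_words + ['</s>']))
    # Naive lexical channel P(c|e): each source word is explained by
    # its best-matching target word (a simplification of the model).
    channel = sum(math.log(max(lex_prob(s, t) for t in target_words))
                  for s in source_words)
    return lm + channel  # the decoder would maximize this over parses
```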